Polar codes for efficient encoding and decoding in redundant disk arrays

ABSTRACT

An improved technique applies polar codes to storage data to improve the reliability of a storage system that uses high-performance, solid-state disks as part of a RAID group for storing frequently-accessed data. Along these lines, a high-performance storage system having n solid-state disks assigns k of those disks as payload disks. The storage system partitions the payload data into a data vector that has k data symbols. The storage system then applies, to the k payload symbols, an (n, k) polar code generator matrix derived from k rows of the ┌log₂ n┐-times Kronecker product of the matrix

$\quad\begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}$ with itself to produce n encoded symbols, and stores each of the encoded payload symbols in a solid-state disk of the RAID group.

BACKGROUND

Some storage systems have configurations that use fast solid-state disks (e.g., Flash drives) to store frequently-accessed data in order to increase access speed to such frequently-accessed data nominally stored on disk drives. FAST Cache, a technology made available by EMC Corp., is an example of such a configuration. The solid-state disks are typically placed in a path between a hard disk and a DRAM cache in the storage system. In this way, the storage system can also use the solid-state disks as a place to offload data from the DRAM cache with a smaller access-time penalty than moving such data to the hard disk.

While such configurations achieve high performance, Redundant Arrays of Independent Disks (RAID) are commonly used to provide high-reliability access to large amounts of data storage. There are several types of RAID, ranging from simpler RAID 0 and RAID 1 (data mirroring) through more complex RAID 5 and RAID 6. RAID 5 encodes stripes of data across a plurality of disks with one disk (which rotates from stripe to stripe) storing a parity redundancy code for that stripe, which allows stored data to be recovered even in the event of a disk failure. This parity code involves performing a compound exclusive-or (XOR) operation on corresponding blocks on the different disks. RAID 6 employs a similar approach, but uses two redundancy disks, allowing stored data to be recovered even in the event of two disk failures. There are several ways of calculating the values stored on the redundancy disks for RAID 6, such as even-odd parity (which involves storing row parity on one disk and diagonal parity on another disk) and Reed-Solomon encoding.
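The RAID 5 parity computation described above can be illustrated with a minimal sketch. The block contents and the helper name `xor_blocks` are hypothetical and serve only to show that the parity block is the byte-wise XOR of the data blocks, and that a single lost block equals the XOR of the surviving blocks and the parity block.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks belonging to one stripe.
data = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x0f"]
parity = xor_blocks(data)

# Simulate the loss of block 1 and rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```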

SUMMARY

In some cases, the reliability of high-performance storage systems such as those described above is improved by using a RAID configuration. While maximum distance separable codes such as Reed-Solomon codes found in RAID 6 are efficient in that they minimize the number of parity disks required for a given level of redundancy, encoding and decoding of such codes is typically too complex in high-performance storage systems such as FAST Cache. Rather, for reliability, a conventional storage system employing FAST Cache arranges the solid-state disks in a simpler RAID 1 array. In this way, the storage system maintains its high performance while improving reliability.

Unfortunately, there are deficiencies with the conventional storage system employing FAST Cache. For example, RAID 1 results in a payload capacity that is only 50% of the physical disk space. Also, there is empirical evidence that the reliability of RAID 1 is not sufficient for systems employing such high-performance disks.

In contrast to the conventional storage system employing FAST Cache, which uses a relatively unreliable, high-cost redundancy scheme, an improved technique applies polar codes to storage data to improve the reliability of a storage system that uses high-performance solid-state disks as part of a RAID group for storing frequently-accessed data. Along these lines, a high-performance storage system having n solid-state disks assigns k of those disks as payload disks. The storage system partitions the payload data into a data vector that has k data symbols. The storage system then applies, to the k payload symbols, an (n, k) polar code with a generator matrix derived from k rows of the ┌log₂ n┐-times Kronecker product of the matrix

$\quad\begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}$ with itself. For the case of systematic encoding, the generator matrix is reduced to the canonical form containing a k×k identity submatrix in some positions, which is used to produce n encoded symbols from the k original ones, and the storage system stores each of the encoded payload symbols in a solid-state disk of the RAID group.

Advantageously, the improved technique involves a reduced-complexity encoding method for the high-performance disks, while still using fewer parity disks than simple codes like RAID 1. Decoding and partial stripe update operations on the encoded data also have reduced complexity. The reduced number of parity disks involved in using polar codes stems from the fact that polar codes can achieve the theoretical capacity of a binary-input output-symmetric memoryless channel. By splitting such a channel into n subchannels, it can be shown that, when the capacity of the channel is k/n, k of those subchannels will be noise-free for large values of n. Thus, encoding with polar codes requires fewer extra subchannels, or in the case of storage, parity disks. Further, by using a systematic generalized concatenated code (GCC) formulation of polar codes, the encoding complexity is further reduced by approximately a factor of 2 (for the case of high-rate codes) compared to other encoding algorithms for polar codes.

One embodiment of the improved technique is directed to a method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. The method includes partitioning the payload data into a data vector that includes k data symbols. The method further includes applying an (n, k) polar code generator matrix to the payload vector to produce a code vector that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from the ┌log₂ n┐-times Kronecker product of a 2×2 polar seed matrix with itself. The method further includes storing each of the n encoded symbols of the codeword in a solid-state disk of the RAID group.

Additionally, some embodiments of the improved technique are directed to an apparatus constructed and arranged to reliably store data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. The apparatus includes a network interface, memory, and a controller including controlling circuitry constructed and arranged to carry out the method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.

Furthermore, some embodiments of the improved technique are directed to a computer program product having a non-transitory computer readable storage medium which stores code including a set of instructions to carry out the method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.

BRIEF DESCRIPTION OF THE DRAWING

The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying figures in which like reference characters refer to the same parts throughout the different views.

FIG. 1a is a block diagram illustrating an example electronic environment for carrying out the improved technique.

FIG. 1b is a block diagram illustrating an example electronic environment for carrying out the improved technique.

FIG. 2 is a block diagram illustrating an example storage system within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 3 is a flow chart illustrating an example method of carrying out the improved technique within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 4 is a block diagram illustrating an example encoding scheme using a polar code within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 5 is a block diagram illustrating an example encoding scheme using a systematic polar code within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 6 is a flow chart illustrating an example of decoding using the GCC scheme within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 7 is a flow chart illustrating an example of decoding using Gaussian elimination within the electronic environment shown in FIG. 1a and FIG. 1b.

FIG. 8 is a flow chart illustrating an example method of updating information symbols and corresponding check symbols of a codeword of a systematic polar code within the electronic environment shown in FIG. 1a and FIG. 1b.

DETAILED DESCRIPTION

An improved technique applies polar codes to storage data to improve the reliability of a storage system that uses high-performance solid-state disks as part of a RAID group for storing frequently-accessed data. Along these lines, a RAID group having n solid-state disks assigns k of those disks as payload disks. The storage system partitions the payload data into a data vector that has k data symbols. The storage system then applies, to the k payload symbols, an (n, k) polar code generator matrix derived from k rows of the ┌log₂ n┐-times Kronecker product of the matrix

$\quad\begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}$ with itself to produce n encoded symbols, and stores each of the encoded payload symbols in a solid-state disk of the RAID group.

Advantageously, the improved technique involves a reduced-complexity encoding for the high-performance disks, while still using fewer parity disks than simple codes like RAID 1. Decoding and partial stripe update operations on the encoded data also have reduced complexity. The reduced number of parity disks involved in using polar codes stems from the fact that polar codes can achieve the theoretical capacity of a binary-input output-symmetric memoryless channel. By splitting such a channel into n subchannels, it can be shown that, when the capacity of the channel is k/n, k of those subchannels will be noise-free for large values of n. In this case the number of parity disks, which is equal to the number of frozen subchannels, is the minimum possible for a reliable system with k payload disks. Further, by using a systematic encoding algorithm developed for the generalized concatenated code (GCC) formulation of polar codes, the encoding complexity is reduced by approximately a factor of 2 (for the case of high-rate codes) compared to other encoding algorithms for polar codes.

FIG. 1a illustrates an electronic environment 10 for carrying out the improved technique. Electronic environment 10 includes application server 12 and storage system 14.

Application server 12 is configured to store data in storage system 14. Application server 12 is a server system. In some arrangements, however, application server 12 is a desktop personal computer, a laptop personal computer, a tablet computer, a smart phone, or any other electronic device that is enabled to store data in storage system 14.

Storage system 14 is configured to store data 24 from application server 12 in disk array 21. Storage system 14 is further configured to store frequently-accessed data 26 in solid-state disk array 19 and apply polar codes for encoding and decoding data and updating partial stripes. Storage system 14 includes solid-state disk array 19, disk array 21, and DRAM cache 22.

Disk array 21 includes disks 20(1), 20(2), 20(3), . . . , 20(r). Each disk of disk array 21 is a hard disk drive such as a magnetic disk drive, although other types of disks are possible (e.g., slow Flash, optical disk). In some arrangements, disk array 21 is a RAID.

Solid-state disk array 19 takes the form of a set of flash drives 18(1),. . . , 18(k−2), 18(k−1), 18(k), 18(k+1), 18(k+2), . . . , 18(n).

DRAM cache 22 is configured to provide very fast access to a small subset of data 24. DRAM cache 22 is also configured to send data to solid-state disk array 19 and/or disk array 21 when that data is no longer in sufficiently active use by storage system 14.

During operation, application server 12 sends data 24 to disk array 21 for storage. Over time, storage system 14 recognizes data 26 that has been frequently accessed by application server 12. Storage system 14 then moves data 26 to solid-state disk array 19 for fast access.

It should be understood that storage system 14 splits data 26 into a set of k symbols to be stored among the n solid-state disks of array 19. Each symbol includes a set of characters chosen from a fixed alphabet which represents a portion of data 26. A code includes a lexicon of codewords of length n consisting of such characters, and as such is characterized by a minimum distance between distinct codewords of the lexicon. For the purposes of the discussion below, the alphabet of the codes is taken to be the set {0,1}.

It should be understood that the disks within a storage system are characterized by a failure probability. The linear transformation given by $F^{\otimes s}$, where $\otimes s$ denotes the s-times Kronecker product of a matrix with itself,

${F = \begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}},$ and s=┌log₂ n┐, induces n subchannels, which are characterized by substantially different erasure probabilities. Let A be the set of k subchannels with the smallest erasure probability. The non-systematic generator matrix G′ of a polar code is obtained by taking the rows of $F^{\otimes s}$ with indices in A. By applying elementary row operations to matrix G′, it is possible to obtain a systematic generator matrix G for the same code, which contains an identity submatrix in the columns given by A. Such a systematic generator matrix is advantageous from the application point of view, since it enables one to partition the codeword into information symbols, which correspond to the payload data, and parity symbols, which are used to recover the payload data if some of the disks fail.
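The selection of the most reliable subchannels can be sketched as follows, under two assumptions that the text does not state explicitly: each disk is modeled as a binary erasure channel with erasure probability `eps`, and the standard polarization recursion applies, in which one step turns a channel with erasure probability z into a worse channel (2z − z²) and a better channel (z²). The function names and the index convention (the bit added by the latest step is the most significant bit of the 0-based index) are illustrative choices; with eps = 0.5 this convention happens to select the same five rows as the (8,5) example given later in the text.

```python
def subchannel_erasure_probs(eps, s):
    """Erasure probabilities of the 2**s subchannels of a BEC(eps).

    Index convention (an assumption): the bit added by the latest
    polarization step is the most significant bit of the index.
    """
    z = [eps]
    for _ in range(s):
        # worse half first (2z - z^2), better half second (z^2)
        z = [2 * p - p * p for p in z] + [p * p for p in z]
    return z

def most_reliable(eps, s, k):
    """Sorted 0-based indices of the k subchannels with the smallest erasure probability."""
    z = subchannel_erasure_probs(eps, s)
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])

z8 = subchannel_erasure_probs(0.5, 3)
best5 = most_reliable(0.5, 3, 5)
```

The remaining n − k indices correspond to the frozen subchannels, that is, to the parity disks.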

It should further be understood that the arrangement of the solid-state disks of array 19 is shown as in FIG. 1a for simplicity and does not necessarily represent the particular placement of payload and parity data within array 19. For the purpose of the discussion below, the payload data has indices taken from a set A, which is a set of k indices taken from the set {1 . . . n} and corresponding to the subchannels with the lowest probabilities of decoding failure.

FIG. 1b illustrates electronic environment 10 after the application of an (n, k) polar code to payload data 30. The (n, k) polar code has a generator matrix 34 that consists of k rows of the matrix $F^{\otimes s}$, where $\otimes s$ denotes the s-times Kronecker product of a matrix with itself,

$F = \begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}$ and s=┌log₂ n┐. The k rows are those rows having indices in A. For example, an (8,5) polar code generator matrix takes the form

$G^{\prime} = {\begin{pmatrix}1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\end{pmatrix}.}$ In this case, the set A={1, 2, 3, 5, 7}.
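The construction of G′ from the Kronecker power of F can be checked with a short sketch. The helper names are hypothetical, and the 0-based row indices in `rows` are chosen simply because they reproduce the G′ printed above; how that row numbering relates to the 1-based set A is treated here as an assumption rather than something the text settles.

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def kron_power(f, s):
    """s-times Kronecker product of f with itself."""
    result = [[1]]
    for _ in range(s):
        result = kron(result, f)
    return result

F = [[1, 0], [1, 1]]
F3 = kron_power(F, 3)  # 8 x 8 matrix for n = 8, s = 3

# 0-based row indices that reproduce the printed G' (an assumption
# about the row numbering, not a statement about the set A).
rows = [1, 3, 5, 6, 7]
G_prime = [F3[i] for i in rows]
```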

Storage system 14 applies a k×n generator matrix in canonical form of polar code 34, such as

$G = {{MG}^{\prime} = \begin{pmatrix}1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 1 & 1\end{pmatrix}}$ to a 1×k row vector of payload data 30 to obtain a set of n encoded symbols containing n−k check symbols 32 in the positions {1 . . . n}\A. Storage system 14 stores encoded symbols 30 and 32 in solid-state disk array 19.
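As a sketch of how the canonical form can be obtained and used, the following reduces the printed G′ to reduced row echelon form over GF(2), which for this example coincides with the printed G, and then encodes a payload vector. The function names and the sample payload are hypothetical; positions are 0-based, so the pivot columns [0, 1, 2, 4, 6] correspond to A={1, 2, 3, 5, 7}.

```python
def rref_gf2(matrix):
    """Reduced row echelon form over GF(2); returns (rows, pivot_columns)."""
    m = [row[:] for row in matrix]
    pivots, r = [], 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col]:
                m[i] = [a ^ b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def encode(u, g):
    """Codeword u*G over GF(2) for a payload row vector u."""
    return [sum(ui * gij for ui, gij in zip(u, col)) % 2 for col in zip(*g)]

G_prime = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
G, info_positions = rref_gf2(G_prime)

payload = [1, 0, 1, 1, 0]           # hypothetical payload symbols
codeword = encode(payload, G)        # payload reappears at info_positions
```

Because G is systematic, reading the payload back requires no decoding when no disk has failed: the payload symbols are stored unchanged at the information positions.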

It should be understood that the formulation for deriving encoded symbols 30 and 32 as described above does not result in the most efficient storage scheme. Details of alternative schemes for applying polar codes to data 26 are described below with respect to FIGS. 4 and 5.

Further details of storage system 14 are described below with respect to FIG. 2.

FIG. 2 illustrates details of an example storage system 14 (see FIG. 1a). Storage system 14 includes controller 40, which in turn includes processor 44 and memory 46, and network interface 42. It should be understood that storage system 14 also includes other elements as described above with respect to FIG. 1a and FIG. 1b.

Network interface 42 takes the form of an Ethernet card; in some arrangements, network interface 42 takes other forms including a wireless receiver and a token ring card.

Memory 46 is configured to store code 48 that contains instructions configured to cause the processor to carry out the improved technique. Memory 46 generally takes the form of, e.g., random access memory, flash memory or a non-volatile memory.

Processor 44 takes the form of, but is not limited to, Intel or AMD-based MPUs, and can include a single core or multiple cores, each running single or multiple threads. In some arrangements, processor 44 is one of multiple processors working together.

FIG. 3 illustrates an example method 50 of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks, including steps 52, 54 and 56. In step 52, the payload data is partitioned into a vector that includes k data symbols. In step 54, an (n, k) polar code generator matrix is applied to the payload vector to produce a vector that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from the ┌log₂ n┐-times Kronecker product of a 2×2 polar seed matrix F with itself. In step 56, each of the n encoded symbols of the obtained vector is stored in a solid-state disk of the RAID group.

Further details of applying polar codes to encode data 26 are discussed below with respect to FIGS. 4 and 5.

FIG. 4 illustrates an example procedure for encoding data 26 (see FIG. 1a). This procedure corresponds to a non-systematic encoding of data 26 using a generalized concatenated code (GCC) representation of a (16,10) polar code. An advantage of such a representation is a reduced complexity of encoding with respect to a direct representation of a polar code as described above.

A GCC representation of an (n, k) polar code having a generator matrix G involves decomposing a vector of the k payload data symbols 30 into v subvectors, each of length k_(i), 1≦i≦v, such that Σ_(1≦i≦v) k_(i)=k.

For example, consider a vector of data 30 arranged as illustrated in FIG. 4. Processor 44 (see FIG. 2) splits the vector of data 30 having 10 elements into v=4 subvectors such that k₁=1, k₂=2, k₃=3, and k₄=4. Processor 44 then arranges each subvector into a row of an array of intermediate encoded symbols 62.

It should be understood that there are several ways of splitting the vector of data 30 into subvectors. For example, processor 44 could arrange the vector of data 30 into 2 rows, or 8 rows. If the number of rows is v=2^(l), then the decomposition is said to be of order l.

A GCC decomposition of order l of G′ follows from the identity $F^{\otimes s} = F^{\otimes l} \otimes F^{\otimes (s-l)}$. That is, a GCC can be split into a set of outer codes, the i^(th) of which is an (N, k_(i)) code, 1≦i≦v and N=2^(s-l), and a set of nested inner codes, the i^(th) of which is a (v, v−i+1) code. The outer codes and the inner codes are each examples of polar codes.

The i^(th) outer code operates on a subvector of data 30 within a row of the array of intermediate symbols 62 to produce the rest of the elements of that row. For an l=2 decomposition, the resulting array of intermediate symbols 62 has dimension 4×4. In the array of intermediate symbols, the first row has one element from the vector of data 30, so that the outer code generator matrix for that row needs to produce three additional entries. The first outer code in this case has a 1×4 generator matrix, which contains as a submatrix the following check symbols generator matrix: B⁽¹⁾=(1 1 1); this implies that the subsequent elements of the first row are equal to the first element, as illustrated in FIG. 4. The second outer code has check symbols generator matrix

${\mathcal{B}^{(2)} = \begin{pmatrix}1 & 0 \\1 & 1\end{pmatrix}};$ the third outer code has check symbols generator matrix

$\mathcal{B}^{(3)} = {\begin{pmatrix}1 \\1 \\1\end{pmatrix}.}$ Note that, because the fourth row is filled with data, no outer code is needed there.

It should be understood that the lengths of the subvectors, i.e., the dimensions k_(i) of the outer codes, may not be arranged in ascending order after decomposition of the polar code. In such a case, the generator matrix for the first inner code may not form a lower-triangular matrix. Such a lower-triangular matrix is desirable in the case of systematic encoding, as will be seen with respect to FIG. 5 below. In order to ensure such a structure, processor 44 would apply a row- and column-swapping matrix to the generator matrix of the first inner code. This induces an ascending order of the dimensions of the outer codes.

Once the processor fully forms the array of intermediate encoded symbols 62, processor 44 then applies a v×v transposed generator matrix of the first inner code 64 to the array of intermediate encoded symbols 62 to produce the n=Nv final encoded symbols 66 stored in a v×N array.

The first inner code is generated by a permutation matrix multiplied by $F^{\otimes l}$, so that the transposed generator matrix 64, V⁽¹⁾, of the first inner code is upper triangular. In the case illustrated in FIG. 4,

$V^{(1)} = {\begin{pmatrix}1 & 1 & 1 & 1 \\0 & 1 & 0 & 1 \\0 & 0 & 1 & 1 \\0 & 0 & 0 & 1\end{pmatrix}.}$ Multiplying the array 62 by V⁽¹⁾ 64 yields the array of final encoded symbols 66 illustrated in FIG. 4. Processor 44 then stores each element of array 66 in storage array 19 (see FIG. 1a) to complete this encoding scheme using a GCC representation of a polar code.
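The non-systematic GCC encoding of FIG. 4 can be sketched as follows for the (16,10) example. The sketch assumes that each outer code is used in the form (data | data·B⁽ⁱ⁾) and that the final array is obtained as V⁽¹⁾ times the intermediate array over GF(2), which is the reading consistent with Equation (*) in the text. The function names and the sample data vector are hypothetical.

```python
def vec_mat(u, m):
    """Row vector times matrix over GF(2)."""
    return [sum(x * y for x, y in zip(u, col)) % 2 for col in zip(*m)]

def mat_mul(a, b):
    """Matrix product over GF(2)."""
    return [vec_mat(row, b) for row in a]

# Check symbols generator matrices of the outer codes, as given in the text.
B = [
    [[1, 1, 1]],          # B(1): 1 x 3
    [[1, 0], [1, 1]],     # B(2): 2 x 2
    [[1], [1], [1]],      # B(3): 3 x 1
    [],                   # fourth row is filled with data only
]
V1 = [[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]

def gcc_encode(data, sizes=(1, 2, 3, 4)):
    """Return (intermediate array, final array) for a 10-symbol data vector."""
    rows, pos = [], 0
    for k_i, b_i in zip(sizes, B):
        u = data[pos:pos + k_i]
        pos += k_i
        checks = vec_mat(u, b_i) if b_i else []
        rows.append(u + checks)          # outer codeword of row i
    return rows, mat_mul(V1, rows)       # inner code mixes the rows

data = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1]   # hypothetical payload
intermediate, final = gcc_encode(data)
```

Note that only the last row of the final array still shows the data symbols unchanged, which is why reading the payload back is more involved than with the systematic scheme of FIG. 5.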

It should be understood that the final encoded symbols 66 are different from the data symbols 30. In this case, reading the payload data may require more complexity than in the case of systematic encoding. In a systematic code, the encoded symbols include the original data symbols in addition to some check symbols. Such a systematic encoding for polar codes is discussed below with respect to FIG. 5.

FIG. 5 illustrates an example procedure for encoding data 26 using a systematic polar code. In this procedure, an array of final encoded symbols 70 is constructed in a similar manner as the array of intermediate symbols 62, with the data symbols 30 being arranged in each row according to the values of the k_(i). The procedure involves generating check symbols in array 70.

Consider a representation of the i^(th) outer code generator matrix in the canonical form (I B^((i))), where I is the k_(i)×k_(i) identity matrix, and B^((i)) is a k_(i)×(N−k_(i)) check symbols generator matrix. By considering the construction of the array of intermediate symbols and the array of final encoded symbols from the above example, one can show that processor 44 may generate the check symbols of the array of final encoded symbols 70 according to the following expression:

$c_{i,\,k_i+1 \ldots N} = \left(V^{(1)}\right)^{-1}_{i,\,i \ldots v}\, c_{i \ldots v,\,1 \ldots k_i}\, B^{(i)} - \left(V^{(1)}\right)^{-1}_{i,\,i+1 \ldots v}\, c_{i+1 \ldots v,\,k_i+1 \ldots N}, \quad i = v, \ldots, 1, \qquad (*)$

where c_(i,j) is the element of the array of final encoded symbols 70 in the i^(th) row and j^(th) column, $c_{i_1 \ldots i_2,\,j_1 \ldots j_2}$ represents a subarray that includes the i₁^(th) through the i₂^(th) rows and the j₁^(th) through the j₂^(th) columns of array 70, $\left(V^{(1)}\right)^{-1}$ represents the inverse of V⁽¹⁾, and $\left(V^{(1)}\right)^{-1}_{i,\,j_1 \ldots j_2}$ represents the i^(th) row and the j₁^(th) through the j₂^(th) columns of $\left(V^{(1)}\right)^{-1}$.

It should be understood that Equation (*) represents a recursive set of expressions to be evaluated in the order specified, i.e., from the last row to the first.

For the above example illustrated in FIG. 4, with the (16,10) polar code generator matrix 34, the check symbols as illustrated in FIG. 5 are generated from the data symbols 30 as follows:

$c_{3,4} = c_{3,1} + c_{3,2} + c_{3,3} + c_{4,1} + c_{4,2} + c_{4,3} - c_{4,4},$

$c_{2,3} = c_{2,1} + c_{2,2} + c_{4,1} + c_{4,2} - c_{4,3},$

$c_{2,4} = c_{2,2} + c_{4,2} - c_{4,4},$

$c_{1,2} = c_{1,1} + c_{2,1} - c_{2,2} + c_{3,1} - c_{3,2} + c_{4,1} - c_{4,2},$

$c_{1,3} = c_{1,1} + c_{2,1} - c_{2,3} + c_{3,1} - c_{3,3} + c_{4,1} - c_{4,3},$

$c_{1,4} = c_{1,1} + c_{2,1} - c_{2,4} + c_{3,1} - c_{3,4} + c_{4,1} - c_{4,4}.$
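The recursion of Equation (*) can be sketched for this (16,10) example as follows. The sketch works over GF(2), where subtraction coincides with addition, and uses the fact that the V⁽¹⁾ given in the text is its own inverse over GF(2). Function names and the sample data are hypothetical; indices are 0-based in the code.

```python
def vec_mat(u, m):
    """Row vector times matrix over GF(2)."""
    return [sum(x * y for x, y in zip(u, col)) % 2 for col in zip(*m)]

B = [[[1, 1, 1]], [[1, 0], [1, 1]], [[1], [1], [1]], []]
V1 = [[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
V1_INV = V1  # V(1) * V(1) = I over GF(2) for this example

def systematic_encode(data, sizes=(1, 2, 3, 4), n_cols=4):
    """Fill the check positions of the v x N array using Equation (*)."""
    v = len(sizes)
    c = [[0] * n_cols for _ in range(v)]
    pos = 0
    for i, k_i in enumerate(sizes):          # place data symbols first
        c[i][:k_i] = data[pos:pos + k_i]
        pos += k_i
    for i in range(v - 1, -1, -1):           # i = v ... 1 in the text
        k_i = sizes[i]
        if k_i == n_cols:
            continue                         # row holds data only
        # information part of the i-th intermediate row
        info = [sum(V1_INV[i][r] * c[r][j] for r in range(i, v)) % 2
                for j in range(k_i)]
        checks = vec_mat(info, B[i])
        for t, j in enumerate(range(k_i, n_cols)):
            lower = sum(V1_INV[i][r] * c[r][j] for r in range(i + 1, v)) % 2
            c[i][j] = (checks[t] - lower) % 2
    return c

data = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1]        # hypothetical payload
array70 = systematic_encode(data)
```

Each check symbol depends only on data symbols and on check symbols of lower rows that have already been computed, which is why the rows must be processed from the last to the first.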

With systematic encoding, processor 44 needs only to generate n−k check symbols 32 (see FIG. 1b). The form of the polar code ensures that the operations involved in such check symbol generation are simple, as illustrated above.

Two decoding schemes, i.e., schemes for reconstruction of the payload data in the presence of erasures, which correspond to disk failures, are discussed below. The first scheme employs the GCC representation of polar codes and the second scheme is based on Gaussian elimination.

FIG. 6 illustrates a process 78 of decoding encoded symbols 30 and 32 (see FIG. 1b) that were generated according to the procedure described above to produce final encoded symbol array 70. Process 78 includes steps 80, 82, 84, 86, 88, and 90. Advantageously, process 78 involves a reduced set of decoding operations as compared to the prior art.

In step 80, the smallest row index h in the encoded data matrix having an erasure is found. A new matrix c_(h) is then formed from the final v−h+1 rows of final encoded matrix 70.

In step 82, an index i←h is set, and a loop over another index j←1 . . . N is initialized. A GCC is constructed for row index h that includes a set of inner codes, indexed by the pair (h, i), and a set of outer codes, indexed by i. The inner code with indices (h, i) has a generator matrix equal to V_(i . . . v,h . . . v) ⁽¹⁾, while the i^(th) outer code has a generator matrix as described above, where i=h . . . v.

In step 84, a decode operation is performed on columns of c_(h) using the inner code with indices (h, i) to produce intermediate symbol a_(i,j). At this point, an erasure may be revealed. If no erasure is revealed, then the process proceeds to step 88; otherwise the process proceeds to step 86.

In step 86, a decode operation is performed on intermediate symbols a_(i,j . . . N) in the i^(th) outer code to recover erased symbols. If this fails, then index h is decremented by one, the index i←h is set, and the loop over the index j←1 . . . N is reinitialized. If no further erasure is revealed, then the process proceeds to step 88.

In step 88, a new loop over the index j←h . . . i is initialized, with j being incremented at each step of the loop. The symbols are decremented according to c_(j,1 . . . N)←c_(j,1 . . . N)−a_(i,1 . . . N) when V_(j,i) ⁽¹⁾=1.

In step 90, a decoding operation is performed on the adjusted codeword symbols in the inner code with indices (h, i+1). Upon completion of this step, the original data 30 and 32 are recovered so long as there were no failures.

Gaussian elimination represents the maximum likelihood erasure decoding scheme for any linear code. Further details of such a scheme are described below with respect to FIG. 7.

FIG. 7 illustrates a process 100 of recovering erased information symbols x_(A∩ε) for given information positions A and erasure positions ε using a Gaussian elimination scheme in a code having generator matrix G. Process 100 includes steps 102, 104, 106, 108, and 110. (NB a vector of n symbols x has a subvector x_(S), where S⊂{1 . . . n}.)

In step 102, a subset of non-erased symbols x_({1 . . . n}\ε) is formed.

In step 104, subsets of erased and non-erased information symbols x_(A∩ε) and x_(A\ε) are formed.

In step 106, a submatrix of generator matrix G_(A\ε,{1 . . . n}\ε) is formed.

In step 108, a submatrix of generator matrix G_(A∩ε,{1 . . . n}\ε) is formed.

In step 110, the following equation is formed and solved using Gaussian elimination for x_(A∩ε):

$x_{\{1 \ldots n\} \setminus \varepsilon} - x_{A \setminus \varepsilon}\, G_{A \setminus \varepsilon,\,\{1 \ldots n\} \setminus \varepsilon} = x_{A \cap \varepsilon}\, G_{A \cap \varepsilon,\,\{1 \ldots n\} \setminus \varepsilon}.$

In this way, processor 44 can recover erased information symbols.
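Steps 102 through 110 can be sketched for the systematic (8,5) generator matrix G given earlier. The function names, the 0-based positions, and the use of `None` to mark an erased symbol are illustrative assumptions; rows of G are taken to correspond, in order, to the information positions.

```python
G = [
    [1, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
INFO_POS = [0, 1, 2, 4, 6]   # 0-based form of A = {1, 2, 3, 5, 7}

def solve_gf2(m, b):
    """Solve y*m = b over GF(2); return y, or None if y is not uniquely determined."""
    unknowns, eqs = len(m), len(b)
    aug = [[m[i][j] for i in range(unknowns)] + [b[j]] for j in range(eqs)]
    row = 0
    for col in range(unknowns):
        p = next((t for t in range(row, eqs) if aug[t][col]), None)
        if p is None:
            return None                       # this unknown cannot be resolved
        aug[row], aug[p] = aug[p], aug[row]
        for t in range(eqs):
            if t != row and aug[t][col]:
                aug[t] = [x ^ y for x, y in zip(aug[t], aug[row])]
        row += 1
    if any(eq[-1] for eq in aug[row:]):
        return None                           # inconsistent equations
    return [aug[i][-1] for i in range(unknowns)]

def recover_info(x, erased):
    """Recover erased information symbols; x holds None at erased positions."""
    kept = [j for j in range(len(x)) if j not in erased]             # step 102
    known = [r for r, p in enumerate(INFO_POS) if p not in erased]   # step 104
    lost = [r for r, p in enumerate(INFO_POS) if p in erased]
    rhs = [(x[j] - sum(x[INFO_POS[r]] * G[r][j] for r in known)) % 2
           for j in kept]                                            # uses step 106 submatrix
    sub = [[G[r][j] for j in kept] for r in lost]                    # step 108
    sol = solve_gf2(sub, rhs)                                        # step 110
    return None if sol is None else {INFO_POS[r]: bit for r, bit in zip(lost, sol)}

received = [None, 0, 1, None, 1, 0, 0, 1]   # codeword 10101001 with positions 0 and 3 erased
recovered = recover_info(received, {0, 3})
```

The sketch returns `None` when the erased information symbols are not uniquely determined by the surviving symbols, which depends on the erasure pattern and on the code.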

In some arrangements, processor 44 is configured to perform a partial stripe update operation. This involves changing information symbols within a codeword and the corresponding check symbols. Details of partial stripe updating within the systematic encoding described above are described below with respect to FIG. 8.

FIG. 8 illustrates a method 120 of performing a partial stripe update, including steps 122, 124, 126, 128, 130 and 132. In step 122, a set of information symbols to be updated is obtained after generating the v×N matrix of final encoded symbols. In step 124, a difference is produced between a current value of the information symbol and an updated value of the information symbol for each information symbol of the set of information symbols to be updated. In step 126, check symbols are generated for an array having elements that include the difference between a current value of the information symbol and an updated value of the information symbol.

Within the systematic polar coding described above, there are two schemes for achieving such partial stripe updating: by using a generator matrix, and by encoding using the GCC representation. In the first scheme, in step 128, a generator submatrix is formed from the rows of the (n, k) polar code generator matrix corresponding to the indices of the set of information symbols to be updated. In step 130, the array of information symbols to be updated is multiplied by the generator submatrix. In step 132, the array of information symbols to be updated is encoded by Equation (*); the values of the check symbols are read from the disks, the difference between these check symbols and the values of the check symbols from encoding is computed, and this difference is written to the disks.

The second scheme for partial stripe updating includes encoding of an array of information symbols to be updated using Equation (*). Computations are performed only for those symbols that depend on the information symbols being updated.

The result of the multiplication by the generator submatrix, or of the encoding using Equation (*), is a set of check symbol values. The difference between these values and the old values of the check symbols should be written to the disks together with the new values of the information symbols being updated.
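The generator-submatrix variant of the partial stripe update can be sketched with the systematic (8,5) matrix G from the earlier example. The function names and sample values are hypothetical, positions are 0-based, and over GF(2) the "difference" between old and new values is their XOR.

```python
G = [
    [1, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
INFO_POS = [0, 1, 2, 4, 6]   # 0-based information positions

def encode(u):
    """Full encoding u*G over GF(2), used here only for comparison."""
    return [sum(ui * gij for ui, gij in zip(u, col)) % 2 for col in zip(*G)]

def partial_update(codeword, updates):
    """Apply {information position: new bit} without re-encoding the stripe."""
    # difference between current and updated value of each symbol
    delta = {p: codeword[p] ^ new for p, new in updates.items()}
    # generator submatrix: rows of G for the updated symbols
    sub_rows = {p: G[INFO_POS.index(p)] for p in updates}
    new_codeword = codeword[:]
    for j in range(len(codeword)):
        change = 0
        for p in updates:
            change ^= delta[p] & sub_rows[p][j]
        new_codeword[j] ^= change            # only affected symbols change
    return new_codeword

old = encode([1, 0, 1, 1, 0])
updated = partial_update(old, {2: 0, 6: 1})  # rewrite two payload symbols
```

Only the positions where the selected rows of G are nonzero are touched, so disks holding unaffected symbols need not be read or written.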

While various embodiments of the invention have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

For example, it should be understood that, while the examples described above were directed to arrays of solid-state disks, the improved technique applies to arrays of any other type of disk (e.g., magnetic disks).

Further, it should be understood that some embodiments are directed to storage system 14, which is constructed and arranged to reliably store data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. Some embodiments are directed to a process of reliably storing data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. Also, some embodiments are directed to a computer program product which enables computer logic to reliably store data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.

In some arrangements, storage system 14 is implemented by a set of processors or other types of control/processing circuitry running software. In such arrangements, the software instructions can be delivered, within storage system 14, in the form of a computer program product 140 (see FIG. 2), each computer program product having a computer readable storage medium which stores the instructions in a non-volatile manner. Alternative examples of suitable computer readable storage media include tangible articles of manufacture and apparatus such as CD-ROM, flash memory, disk memory, tape memory, and the like.

What is claimed is:
 1. A method of reliably storing data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks, k being less than n, the method comprising: partitioning the payload data into a data vector that includes k data symbols; applying an (n, k) polar code generator matrix to the payload vector to produce a codeword that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from ⌈log₂ n⌉-times Kronecker product of a 2×2 polar seed matrix with itself, the k rows having indices equal to the indices of the k payload symbols of the data vector; and storing each of the n encoded symbols of the codeword in a solid-state disk of the RAID group.
 2. The method of claim 1, wherein n=2^(s) for a positive integer s; wherein applying the (n, k) polar code generator matrix to the payload vector includes: for a positive integer l, splitting the payload data vector into a set of v=2^(l) payload data subvectors, the i^(th) payload data subvector for each index i satisfying 1≦i≦v having k_(i) payload symbols for some positive integer k_(i) satisfying Σ_(i=1)^(v) k_(i)=k, generating, from the (n, k) polar code generator matrix, a set of v outer code generator matrices, the i^(th) outer code generator matrix of the set of v outer code generator matrices for each index i satisfying 1≦i≦v being a (N, k_(i)) polar code generator matrix including exactly k_(i) rows of the N×N matrix derived from (s−l)-times Kronecker product of the 2×2 polar seed matrix with itself, N being equal to 2^(s−l), generating, from the (n, k) polar code generator matrix, (v, v−i+1) inner code generator matrix for each index i satisfying 1≦i≦v being a (v, v−i+1) polar code generator matrix including the (v−i+1) rows of the v×v matrix derived from l-times Kronecker product of the 2×2 polar seed matrix with itself, where v inner codes are nested ones, producing a v×N matrix of intermediate symbols from products of each payload data subvector of the set of v payload data subvectors by a corresponding outer code generator matrix of the set of v outer code generator matrices, and generating a v×N matrix of final encoded symbols from a product of the v×N matrix of intermediate symbols and the first inner code generator matrix of the set of v inner codes generator matrices, the n encoded symbols being the elements of the v×N matrix of final encoded symbols.
 3. The method of claim 2, wherein generating the set of v outer codes generator matrices includes: performing a bit reversal operation on i to produce a bit-reversed index i*; for each index j satisfying 1≦j≦N: extracting a row of the N×N matrix derived from (s−l)-times Kronecker product of the 2×2 polar seed matrix having index j* to produce the j^(th) row of the i*^(th) outer code generator of the set of v outer code generators.
 4. The method of claim 2, wherein generating the set of v outer code generators includes: producing a set of indices by which the set of v outer code generator matrices are arranged in an ascending order by value of k_(i); and wherein generating the v×v inner code generator includes: multiplying a row swapping matrix and the v×v matrix derived from l-times Kronecker product of the 2×2 polar seed matrix with itself, the row swapping matrix arranging the rows and columns of the v×v inner code matrix according to the set of indices, the v×v inner code generator being a lower triangular matrix.
 5. The method of claim 4, wherein the set of v outer code generator matrices represents systematic outer codes; and wherein generating the set of v outer code generator matrices further includes: for each index i satisfying 1≦i≦v: concatenating a k_(i)×k_(i) identity matrix with a k_(i)×(N−k_(i)) check symbols generation matrix B^((i)).
 6. The method of claim 5, wherein the 2×2 polar seed matrix is $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, diagonal elements of the first inner code generator matrix having a value of unity, this matrix being a lower triangular one; wherein generating the v×N matrix of final encoded symbols, i.e. codeword, includes: for each index i satisfying 1≦i≦v: for each index j satisfying 1≦j≦k_(i): setting the (i, j)^(th) element of the v×N matrix of final encoded symbols equal to the j^(th) element of the i^(th) payload data subvector; and generating an inverse of the first inner code transposed generator matrix, for each index i satisfying v≧i≧1, and in order of decreasing values of i: forming a first 1×(v−i+1) subarray of the inverse of the first inner code transposed generator matrix from the i^(th) row and final v−i+1 columns, forming a first (v−i+1)×k_(i) subarray of the v×N matrix of final encoded symbols from the final v−i+1 rows and first k_(i) columns of the v×N matrix of final encoded symbols, multiplying the first 1×(v−i+1) subarray of the inverse of the first inner code transposed generator matrix and the first (v−i+1)×k_(i) subarray of the v×N matrix of final encoded symbols to form a 1×k_(i) intermediate term, multiplying the 1×k_(i) intermediate term and the k_(i)×(N−k_(i)) check symbol generation matrix B^((i)) to form a first 1×(N−k_(i)) product term, forming a second 1×(v−i) subarray of the inverse of the first inner code transposed generator matrix from the i^(th) row and final v−i columns of the inverse of the first inner code transposed generator matrix, forming a second (v−i)×(N−k_(i)) subarray of the v×N matrix of final encoded symbols from the final v−i rows and last N−k_(i) columns, multiplying the second 1×(v−i) subarray of the inverse of the first inner code transposed generator matrix and the second (v−i)×(N−k_(i)) subarray of the v×N matrix of final encoded symbols to form the second 1×(N−k_(i)) product term, and setting an array including the final N−k_(i) elements of the i^(th) row of the v×N matrix of final encoded symbols equal to a difference between the first 1×(N−k_(i)) product term and the second 1×(N−k_(i)) product term.
 7. The method of claim 2: after generating the v×N matrix of final encoded symbols, extracting a set of erased encoded symbols; producing an initial erasure index h based on the first row of the v×N matrix of final encoded symbols that has an erased final encoded symbol; forming an outer subset of the final v−h+1 outer code generator matrices of the set of v outer code generator matrices; forming a set of v−h+1 inner shortened subcode generator matrices, the i^(th) inner shortened subcode generator matrix of the set of v−h+1 inner shortened subcode generator submatrices being a (v−i+1)×(v−h+1) matrix that includes the final v−i+1 rows and the final v−h+1 columns of the first inner code generator matrix, i satisfying h≦i≦v; for each index i satisfying h≦i≦v: performing a decoding operation on a (v−h+1)×N submatrix given by final v−h+1 rows of the v×N matrix of final encoded symbols in the i^(th) inner shortened subcode generator matrices of the set of v−h+1 inner shortened subcode generator matrices to produce a 1×N array of intermediate symbols, and performing the decoding operation on the 1×N array of intermediate symbols in the i^(th) outer code of the set of v outer codes to produce a set of decoded intermediate symbols, and adjusting the v×N matrix of final encoded symbols by subtraction of decoded 1×N of intermediate symbols from rows of the v×N matrix of final encoded symbols given elements of the h^(th) inner shortened subcode generator matrix, and performing a decoding operation on a (v−h+1)×N submatrix given by final v−h+1 rows of the v×N matrix of final encoded symbols in the (i+1)^(th) inner shortened subcode generator matrix of the set of v−h+1.
 8. The method of claim 2, further comprising: performing a decoding operation on the v×N matrix of final encoded symbols using k×n polar code generator matrix, wherein performing the decoding operation includes: forming a first generator submatrix from rows of the polar code generator matrix having indices that correspond to the indices of the set of erased information symbols of final encoded symbols, forming a second generator submatrix from rows of the polar code generator matrix having indices that correspond to the indices of the set of unerased information symbols of final encoded symbols, multiplying a first array that includes unerased information symbols of final encoded symbols and the second generator submatrix to produce a product term, forming a second array that includes unerased final encoded symbols, and using a Gaussian elimination process to generate an array of erased information symbols, i.e. payload symbol, from i) a difference between the second array and the product term and ii) the first generator submatrix.
 9. The method of claim 2, further comprising: obtaining a set of information symbols to be updated; for each information symbol of the set of information symbols to be updated, producing a difference between a current value of the information symbol and an updated value of the information symbol; and generating check symbols for an array having elements that include the difference between a current value of the information symbol and an updated value of the information symbol; and producing updated values of generated check symbols as the difference between their values and current values of check symbols.
 10. The method of claim 9, wherein generating the check symbols includes: forming a generator submatrix from rows of the (n, k) polar code generator matrix corresponding to indices of the set of information symbols to be updated, and multiplying an array of information symbols to be updated by the generator submatrix.
 11. The method of claim 9, wherein producing the check symbols includes: encoding an array of information symbols to obtain check symbols from encoding, reading the current values of the check symbols from the disks, computing a difference between the current values of the check symbols and values of the check symbols from encoding, and writing the difference on the solid-state disks.
 12. An electronic apparatus constructed and arranged to reliably store data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks, k being less than n, the apparatus comprising: a network interface; memory; and a controller coupled to the memory, the controller including controlling circuitry constructed and arranged to: partition the payload data into a data vector that includes k data symbols; apply an (n, k) polar code generator matrix to the payload vector to produce a codeword that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from ⌈log₂ n⌉-times Kronecker product of a 2×2 polar seed matrix with itself, the k rows having indices equal to the indices of the k payload symbols of the data vector; and store each of the n encoded symbols of the codeword in a solid-state disk of the RAID group.
 13. The apparatus of claim 12, wherein n=2^(s) for a positive integer s; wherein applying the (n, k) polar code generator matrix to the payload vector includes: for a positive integer l, splitting the payload data vector into a set of v=2^(l) payload data subvectors, the i^(th) payload data subvector for each index i satisfying 1≦i≦v having k_(i) payload symbols for some positive integer k_(i) satisfying Σ_(i=1)^(v) k_(i)=k, generating, from the (n, k) polar code generator matrix, a set of v outer code generator matrices, the i^(th) outer code generator matrix of the set of v outer code generator matrices for each index i satisfying 1≦i≦v being a (N, k_(i)) polar code generator matrix including exactly k_(i) rows of the N×N matrix derived from (s−l)-times Kronecker product of the 2×2 polar seed matrix with itself, N being equal to 2^(s−l), generating, from the (n, k) polar code generator matrix, (v, v−i+1) inner codes generator matrix for each index i satisfying 1≦i≦v being a (v, v−i+1) polar code generator matrix including the (v−i+1) rows of the v×v matrix derived from l-times Kronecker product of the 2×2 polar seed matrix with itself, where v inner codes are nested ones, producing a v×N matrix of intermediate symbols from products of each payload data subvector of the set of v payload data subvectors by a corresponding outer code generator matrix of the set of v outer code generator matrices, and generating a v×N matrix of final encoded symbols from a product of the v×N matrix of intermediate symbols and the first inner code generator matrix of the set of v inner codes generator matrices, the n encoded symbols being the elements of the v×N matrix of final encoded symbols.
 14. The apparatus of claim 13, wherein generating the set of v outer codes generator matrices includes: performing a bit reversal operation on i to produce a bit-reversed index i*; for each index j satisfying 1≦j≦N: extracting a row of the N×N matrix derived from (s−l) Kronecker products of the 2×2 polar seed matrix having index j* to produce the j^(th) row of the i*^(th) outer code generator of the set of v outer code generators.
 15. The apparatus of claim 13, wherein generating the set of v outer code generators includes: producing a set of indices by which the set of v outer code generator matrices are arranged in an ascending order by value of k_(i); and wherein generating the v×v inner code generator includes: multiplying a row swapping matrix and the v×v matrix derived from l-times Kronecker product of the 2×2 polar seed matrix with itself, the row swapping matrix arranging the rows of the v×v inner code matrix according to the set of indices, the v×v generator matrix of the first inner code being a lower triangular matrix.
 16. The apparatus of claim 15, wherein the set of v outer code generator matrices represents systematic outer codes; and wherein generating the set of v outer code generator matrices further includes: for each index i satisfying 1≦i≦v: concatenating a k_(i)×k_(i) identity matrix with a k_(i)×(N−k_(i)) check symbol generation matrix B^((i)).
 17. The apparatus of claim 16, wherein the 2×2 polar seed matrix is $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, diagonal elements of the first inner code generator matrix having a value of unity, this matrix being a lower triangular one; wherein generating the v×N matrix of final encoded symbols, i.e. codeword, includes: for each index i satisfying 1≦i≦v: for each index j satisfying 1≦j≦k_(i): setting the (i, j)^(th) element of the v×N matrix of final encoded symbols equal to the j^(th) element of the i^(th) payload data subvector; and generating an inverse of the first inner code transposed generator matrix, for each index i satisfying v≧i≧1, and in order of decreasing values of i: forming a first 1×(v−i+1) subarray of the inverse of the first inner code transposed generator matrix from the i^(th) row and final v−i+1 columns, forming a first (v−i+1)×k_(i) subarray of the v×N matrix of final encoded symbols from the final v−i+1 rows and first k_(i) columns of the v×N matrix of final encoded symbols, multiplying the first 1×(v−i+1) subarray of the inverse of the first inner code transposed generator matrix and the first (v−i+1)×k_(i) subarray of the v×N matrix of final encoded symbols to form a 1×k_(i) intermediate term, multiplying the 1×k_(i) intermediate term and the k_(i)×(N−k_(i)) check symbol generation matrix B^((i)) to form a first 1×(N−k_(i)) product term, forming a second 1×(v−i) subarray of the inverse of the first inner code transposed generator matrix from the i^(th) row and final v−i columns of the inverse of the first inner code transposed generator matrix, forming a second (v−i)×(N−k_(i)) subarray of the v×N matrix of final encoded symbols from the final v−i rows and last N−k_(i) columns, multiplying the second 1×(v−i) subarray of the inverse of the first inner code transposed generator matrix and the second (v−i)×(N−k_(i)) subarray of the v×N matrix of final encoded symbols to form the second 1×(N−k_(i)) product term, and setting an array including the final N−k_(i) elements of the i^(th) row of the v×N matrix of final encoded symbols equal to a difference between the first 1×(N−k_(i)) product term and the second 1×(N−k_(i)) product term.
 18. The apparatus of claim 13, wherein the controlling circuitry is further constructed and arranged to: after generating the v×N matrix of final encoded symbols, extracting a set of erased encoded symbols; producing an initial erasure index h based on the first row of the v×N matrix of final encoded symbols that has an erased final encoded symbol; forming an outer subset of the final v−h+1 outer code generator matrices of the set of v outer code generator matrices; forming a set of v−h+1 inner shortened subcode generator matrices, the i^(th) inner shortened subcode generator matrix of the set of v−h+1 inner shortened subcode generator submatrices being a (v−i+1)×(v−h+1) matrix that includes the final v−i+1 rows and the final v−h+1 columns of the first inner code generator matrix, i satisfying h≦i≦v; for each index i satisfying h≦i≦v: performing a decoding operation on a (v−h+1)×N submatrix given by final v−h+1 rows of the v×N matrix of final encoded symbols in the i^(th) inner shortened subcode generator matrices of the set of v−h+1 inner shortened subcode generator matrices to produce a 1×N array of intermediate symbols, and performing the decoding operation on the 1×N array of intermediate symbols in the i^(th) outer code of the set of v outer codes to produce a set of decoded intermediate symbols, and adjusting the v×N matrix of final encoded symbols by subtraction of decoded 1×N of intermediate symbols from rows of the v×N matrix of final encoded symbols given elements of the h^(th) inner shortened subcode generator matrix, and performing a decoding operation on a (v−h+1)×N submatrix given by final v−h+1 rows of the v×N matrix of final encoded symbols in the (i+1)^(th) inner shortened subcode generator matrix of the set of v−h+1.
 19. The apparatus of claim 13, wherein the controlling circuitry is further constructed and arranged to: perform a decoding operation on the v×N matrix of final encoded symbols using k×n polar code generator matrix, wherein performing the decoding operation includes: forming a first generator submatrix from rows of the polar code generator matrix having indices that correspond to the indices of the set of erased information symbols of final encoded symbols, forming a second generator submatrix from rows of the polar code generator matrix having indices that correspond to the indices of the set of unerased information symbols of final encoded symbols, multiplying a first array that includes unerased information symbols of final encoded symbols and the second generator submatrix to produce a product term, forming a second array that includes unerased final encoded symbols, and using a Gaussian elimination process to generate an array of erased information symbols, i.e. payload symbol, from i) a difference between the second array and the product term and ii) the first generator submatrix.
 20. A computer program product having a non-transitory, computer-readable storage medium which stores code for reliably storing data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks, k being less than n, the code including instructions which, when executed by a computer, cause the computer to: partition the data into a data vector that includes k data symbols; apply an (n, k) polar code generator matrix to the payload vector to produce a codeword that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from ⌈log₂ n⌉-times Kronecker products of a 2×2 polar seed matrix with itself, the k rows having indices equal to the indices of the k payload symbols of the data vector; and store each of the n encoded symbols of the codeword in a solid-state disk of the RAID group.